This repository has been archived by the owner on Apr 15, 2024. It is now read-only.

bug fix for issue #104 #190

Open
wants to merge 1 commit into
base: master

@tataganesh tataganesh commented Aug 22, 2017

Sometimes, when the _load_data function in cmapdb.py is called, the following error is raised:

  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 832, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 843, in render_contents
    self.init_resources(resources)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 347, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 195, in get_font
    font = self.get_font(None, subspec)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdfinterp.py", line 186, in get_font
    font = PDFCIDFont(self, spec)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/pdffont.py", line 668, in __init__
    self.unicode_map = CMapDB.get_unicode_map(self.cidcoding, self.cmap.is_vertical())
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/cmapdb.py", line 283, in get_unicode_map
    data = klass._load_data('to-unicode-%s' % name)
  File "/home/ganesh/.virtualenvs/cv/local/lib/python2.7/site-packages/pdfminer/cmapdb.py", line 253, in _load_data
    if os.path.exists(path):
  File "/home/ganesh/.virtualenvs/cv/lib/python2.7/genericpath.py", line 26, in exists
    os.stat(path)
TypeError: stat() argument 1 must be encoded string without null bytes, not str

The snippet in question:

    def _load_data(klass, name):
        filename = '%s.pickle.gz' % name
        logging.info('loading: %r' % name)
        cmap_paths = (os.environ.get('CMAP_PATH', '/usr/share/pdfminer/'),
                      os.path.join(os.path.dirname(__file__), 'cmap'),)
        for directory in cmap_paths:
            path = os.path.join(directory, filename)
            if os.path.exists(path):

Printing the variable name gives:
to-unicode-PDFXC30-Identity
Printing repr(name) gives:
to-unicode-PDFXC30-Identity\x00\x00
The two trailing \x00 (null) characters are invisible when the name is printed, but os.stat() rejects any path that contains them, which raises the TypeError above. One fix that solved the issue for me:
name = name.replace('\0', '')
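
As a rough illustration of the fix, the sketch below strips the null characters before the filename is built. The helper names `sanitize_cmap_name` and `cmap_filename` are hypothetical and not part of pdfminer; the `'cmap'` directory is only a placeholder for one of the entries in `cmap_paths`.

```python
import os


def sanitize_cmap_name(name):
    """Drop NUL characters from a CMap name.

    Hypothetical helper for illustration; not part of pdfminer.
    """
    return name.replace('\0', '')


def cmap_filename(name):
    """Build the pickle filename as in the snippet above, after sanitizing."""
    return '%s.pickle.gz' % sanitize_cmap_name(name)


# The name reported in this pull request, with its two trailing NULs.
raw = 'to-unicode-PDFXC30-Identity\x00\x00'

# repr() makes the hidden characters visible; print() alone would not.
print(repr(raw))
print(repr(sanitize_cmap_name(raw)))

# The resulting path no longer contains NULs, so it is safe to pass to os.stat().
path = os.path.join('cmap', cmap_filename(raw))
print(repr(path))
```

Sanitizing the name once, at the top of the lookup, keeps every later use of the name (logging, filename construction, path checks) free of the stray characters.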
